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Abstract 

This paper proposes a fast multi-band image fusion algorithm, which combines a high-spatial 
low-spectral resolution image and a low-spatial high-spectral resolution image. The well admitted 
forward model is explored to form the likelihoods of the observations. Maximizing the likelihoods 
leads to solving a Sylvester equation. By exploiting the properties of the circulant and downsampling 
matrices associated with the fusion problem, a closed-form solution for the corresponding Sylvester 
equation is obtained explicitly, getting rid of any iterative update step. Coupled with the alternating 
direction method of multipliers and the block coordinate descent method, the proposed algorithm can 
be easily generalized to incorporate prior information for the fusion problem, allowing a Bayesian 
estimator. Simulation results show that the proposed algorithm achieves the same performance as 
existing algorithms with the advantage of significantly decreasing the computational complexity of 
these algorithms. 


Index Terms 


Multi-band image fusion, Bayesian estimation, circulant matrix, Sylvester equation, alternating 
direction method of multipliers, block coordinate descent. 


I. Introduction 


A. Background 

In general, a multi-band image ean be represented as a three-dimensional data eube indexed 
by three exploratory variables {x, y, A), where x and y are the two spatial dimensions of the 
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scene, and A is the speetral dimension (eovering a range of wavelengths). Typieal examples 
of multi-band images inelude hyperspectral (HS) images lUl, multi-speetral (MS) images 
[|2l , integral field speetrographs Q, magnetic resonance speetroscopy images etc. However, 
multi-band imaging generally suffers from the limited spatial resolution of the data acquisition 
devices, mainly due to an unsurpassable tradeoff between spatial and speetral sensitivities 
a. For example, HS images benefit from excellent spectroseopie properties with hundreds 
of bands but are limited by their relatively low spatial resolution compared with MS and 
panchromatic (PAN) images (whieh are aequired in much fewer bands). As a consequenee, 
reconstrueting a high-spatial and high-spectral multi-band image from two degraded and 
complementary observed images is a ehallenging but crueial issue that has been addressed in 
various seenarios. In particular, fusing a high-spatial low-spectral resolution image and a low- 
spatial high-speetral image is an arehetypal instanee of multi-band image reconstruetion, sueh 
as pansharpening (MS-i-PAN) jSll or hyperspeetral pansharpening (HS-i-PAN) Generally, 
the linear degradations applied to the observed images with respeet to (w.r.t.) the target 
high-spatial and high-speetral image reduee to spatial and spectral transformations. Thus, the 
multi-band image fusion problem can be interpreted as restoring a three dimensional data- 
cube from two degraded data-eubes. A more precise deseription of the problem formulation 
is provided in the following paragraph. 


B. Problem Statement 

To better distinguish speetral and spatial degradations, the pixels of the target multi-band 
image, which is of high-spatial and high-spectral resolution, can be rearranged to build an 
m\ X n matrix X, where mx is the number of speetral bands and n = rirXnc is the number of 
pixels in eaeh band (rir and ric represents the number of rows and columns respectively). In 
other words, eaeh column of the matrix X consists of a m a- valued pixel and eaeh row gathers 
all the pixel values in a given speetral band. Based on this pixel ordering, any linear operation 
applied on the left (resp. right) side of X deseribes a speetral (resp. spatial) degradation. 

In this work, we assume that two complementary images of high-spectral or high-spatial 
resolutions, respeetively, are available to reeonstruct the target high-speetral and high-spatial 
resolution target image. These images result from linear speetral and spatial degradations of 
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the full resolution image X, aeeording to the well-admitted model 

Yl = LX + Nl 
Yr = XR + Nr 

where 

• X = [a:;i,..., G is the full resolution target image, 

• Yl G and Yr G are the observed speetrally degraded and spatially 

degraded images, 

• m = rrir X rric is the number of pixels of the high-speetral resolution image, 

• nx is the number of bands of the high-spatial resolution image, 

• Nl and Nr are additive terms that inelude both modeling errors and sensor noises. 

The noise matriees are assumed to be distributed according to the following matrix normal 
distributioni] 

Nl ~ .AAJ^T -^L) Im) 

Nr ~ nx,n(Onx,nj In)- 

Note that no particular structure is assumed for the row covariance matrices Al and Ar 
except that they are both positive definite, which allows for considering spectrally colored 
noises. Conversely, the column covariance matrices are assumed to be the identity matrix to 
reflect the fact that the noise is pixel-independent. 

In most practical scenarios, the spectral degradation L G only depends on the 

spectral response of the sensor, which can be a priori known or estimated by cross-calibration 
m- The spatial degradation R includes warp, translation, blurring, decimation, etc. As the 
warp and translation can be attributed to the image co-registration problem and mitigated by 
precorrection, only blurring and decimation degradations, denoted B and S are considered 
in this work. If the spatial blurring is assumed to be space-invariant, B G owns the 

specific property of being a cyclic convolution operator acting on the bands. The matrix 
S G is a d = dr X dc uniform downsampling operator, which has m = n/d ones on 

the block diagonal and zeros elsewhere, and such that S^S = Note that multiplying by 


'The probability density function p(X|M, Sr, Sc) of a matrix normal distribution AtAfr.cfM, Sr, Sc) is defined by 


p(X|M,Sr,Sc) 


exp (-itr [Sc ^(X - M)'^Sr ^(X - M)]) 
(27r)W2|Sc|'-/2|Srh/2 


where M G is the mean matrix, Sr G is the row covariance matrix and Sc G is the column covariance 

matrix. 
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represents zero-interpolation to inerease the number of pixels from m to n. Therefore, 
assuming R ean be deeomposed as R = BS G the fusion model O can be rewritten 

as 

Yr = XBS + Nr 

where all matrix dimensions and their respective relations are summarized in Table HI 

TABLE I 
Notations 


Notation 

Definition 

Relation 

Tllr 

row number of Yr 

TTl'p ^ Tlj- j d'f 

rric 

column number of Yr 

rUc = Uc/dc 

m 

number of pixels in each band of Yr 

m = TTir X rric 

Ur 

row number of Yl 

Hr = TTlr X dr 

Uc 

column number of Yl 

ric = rric x dc 

n 

number of pixels in each band of Yl 

n = rir X ric 

dr 

decimation factor in row 

II 

dc 

decimation factor in column 

dc = ndmc 

d 

decimation factor 

m = dr X dc 


This matrix equation ([T]) has been widely advocated in the pansharpening and HS pan- 
sharpening problems, which consist of fusing a PAN image with an MS or an HS image 
lf6l . (HI, (H- Similarly, most of the techniques developed to fuse MS and HS images also 
rely on a similar linear model IfTOl - lfT^ . From an applicative point of view, this problem is 
also important as motivated by recent national programs, e.g., the Japanese next-generation 
space-borne HS image suite (HISUI), which fuses co-registered MS and HS images acquired 
over the same scene under the same conditions m- 

To summarize, the problem of fusing high-spectral and high-spatial resolution images 
can be formulated as estimating the unknown matrix X from (21). There are two main 
statistical estimation methods that can be used to solve this problem. These methods are 
based on maximum likelihood (ML) or on Bayesian inference. ML estimation is purely data- 
driven while Bayesian estimation relies on prior information, which can be regarded as a 
regularization (or a penalization) for the fusion problem. Various priors have been already 
advocated to regularize the multi-band image fusion problem, such as Gaussian priors IfTSl . 
m, sparse representations ifT^ or total variation (TV) (20)1 priors. The choice of the prior 
usually depends on the information resulting from previous experiments or from a subjective 
view of constraints affecting the unknown model parameters (211 . (22)1 . 
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Computing the ML or the Bayesian estimators (whatever the form ehosen for the prior) is 
a ehallenging task, mainly due to the large size of X and to the presenee of the downsam¬ 
pling operator S, whieh prevents any direet use of the Fourier transform to diagonalize the 
blurring operator B. To overeome this diffieulty, several eomputational strategies have been 
designed to approximate the estimators. Based on a Gaussian prior modeling, a Markov ehain 
Monte Carlo (MCMC) algorithm has been implemented in [fT^ to generate a eollection of 
samples asymptotieally distributed aeeording to the posterior distribution of X. The Bayesian 
estimators of X ean then be approximated using these samples. Despite this formal appeal, 
MCMC-based methods have the major drawback of being computationally expensive, which 
prevents their effective use when processing images of large size. Relying on exactly the 
same prior model, the strategy developed in lfT9l exploits an alternating direction method of 
multipliers (ADMM) embedded in a block coordinate descent method (BCD) to compute the 
maximum a posterior (MAP) estimator of X. This optimization strategy allows the numerical 
complexity to be greatly decreased when compared to its MCMC counterpart. Based on a 
prior built from a sparse representation, the fusion problem is solved in llT^ . [|^ with 
the split augmented Lagrangian shrinkage algorithm (SALSA) [|2^ . which is an instance of 
ADMM. 

In this paper, contrary to the algorithms described above, a much more efficient method 
is proposed to solve explicitly an underlying Sylvester equation (SE) associated with the 
fusion problem derived from ([21). This solution can be implemented per se to compute the 
ML estimator in a computationally efficient manner. The proposed solution has also the 
great advantage of being easily generalizable within a Bayesian framework when considering 
various priors. The MAP estimators associated with a Gaussian prior similar to [fT^ . lfT9]l can 
be directly computed thanks to the proposed strategy. When handling more complex priors 
such as [fT^ . If20ll . the SE solution can be conveniently embedded within a conventional 
ADMM or a BCD algorithm. 

C. Paper organization 

The remaining of this paper is organized as follows. Section II studies the optimization 
problem to be addressed in absence of any regularization, i.e., in an ME framework. The 
proposed fast fusion method is presented in Section III and generalized to Bayesian estimators 
associated with various priors in Section IV. Section V presents experimental results assessing 
the accuracy and the numerical efficiency of the proposed fusion method. Conclusions are 
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finally reported in Seetion |Vll 


II. Problem Formulation 

Using the statistieal properties of the noise matriees Nl and Nr, Yr and Yr have matrix 
Gaussian distributions, i.e., 

p(Yl|X) = 7UAr„„„(LX,AL,IJ 
p(Yr|X) = 7UAA™,,^(XBS, Ar,I^). 

As the eolleeted measurements Yl and Yr have been aequired by different (possibly 
heterogeneous) sensors, the noise matrices Nl and Nr are sensor-dependent and can be 
generally assumed to be statistically independent. Therefore, Yr and Yr are independent 
conditionally upon the unobserved scene X. As a consequence, the joint likelihood function 
of the observed data is 


p (Yl, Yr|X) = p (Yl|X) p (Yr|X) . (4) 

Since adjacent HS bands are known to be highly correlated, the HS vector Xi usually lives 
in a subspace whose dimension is much smaller than the number of bands m\ l[24l . If25]l . 
i.e., X = HU where H is an orthogonal matrix such that H^H = 1^;, and U E is 

the projection of X onto the subspace spanned by the columns of H G 

Defining = {Yr, Yr} as the set of the observed images, the negative logarithm of the 
likelihood is 

- logp (^-lU) = - logp (Yl|U) - log p (Yr|U) + U 
= i tr ((Yr - HUBS)^ ArI (Yr - HUBS)) + 
i tr ((Yl - LHU)^ A^' (Yr - LHU)) + U 

where U is a constant. Thus, calculating the ML estimator of U from the observed images 
i.e., maximizing the likelihood can be achieved by solving the following problem 

argminL(U) (5) 

where 

L(U) = tr ((Yr - HUBS)^ ArI (Yr - HUBS)) + 
tr ((Yr - LHU)^ Al' (Yr - LHU)) . 
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Note that it is also obvious to formulate the optimization problem Q from the linear 
model dll) direetly in the least-squares (LS) sense If2^ . However, speeifying the distributions 
of the noises Al and Ar allows us to eonsider the ease of eolored noises (band-dependent) 
more easily by introducing the covariance matrices Ar and Al, leading to the weighting LS 
problem (|5]). 

In this paper, we prove that the minimization of © w.r.t. the target image U can be 
solved analytically, without any iterative optimization scheme or Monte Carlo based method. 
The resulting closed-form solution to the optimization problem is presented in Section ITm 
Furthermore, it is shown in Section |IV] that the proposed method can be easily generalized 
to Bayesian fusion methods with appropriate prior distributions. 

III. Fast Fusion Scheme 

A. Sylvester equation 

Minimizing ® w.r.t. U is equivalent to force the derivative of L(U) to be zero, i.e., 
dL(U)/dU = 0, leading to the following matrix equation 

H^Ar^HUBS (BS)^ + (^(LH)^Al^Lh) u 

= H^Ar'Yr(BS)^ + (LH)^AlIYl. 

As mentioned in Section iLBl the difficulty for solving ® results from the high dimensionality 
of U and the presence of the downsampling matrix S. In this work, we will show that Eq. 
® can be solved analytically with some assumptions summarized below. 

Assumption 1. The blurring matrix B is a circulant matrix. 

A consequence of this assumption is that B can be decomposed as B = FDF^ and 
B^ = FD*F^, where F G is the discrete Fourier transform (DFT) matrix (FF^ = 
F^F = I„), D G l^nxn ^ diagonal matrix and * represents the conjugate operator. 

Assumption 2. The decimation matrix S corresponds to downsampling the original signal 
and its conjugate transpose interpolates the decimated signal with zeros. 

A decimation matrix satisfies the property S^S = I^- Moreover, the matrix S = SS^ G 
I^nxn symmetric and idempotent, i.e., S = and SS ^ = = S. For a practical imple¬ 

mentation, multiplying an image by S can be achieved by doing entry-wise multiplication 
with an n X n mask matrix with ones in the sampled position and zeros elsewhere. 
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Note that the assumptions w.r.t. the blurring matrix B and the deeimation matrix S have 
been widely used in image proeessing applieation, like super-resolution ll^ . Il28]l . fusion 
El, Hal, etc. 

After multiplying ® on both sides by we obtain 

CiU + UC2 = C3 (7) 

where 

Cl = (H"Aj;‘H)'‘ ((lh)"a^‘lh) 

C2 = BSB^ 

C3 = (H^ArIh)"' (h^Ar'Yr (BS)^ + (LH)^Ai: 1 Yl) . 

Eq. dV]) is a generalized Sylvester matrix equation [l2l . It is well known that a SE has a 
unique solution if and only if an arbitrary sum of the eigenvalues of Ci and C 2 is not equal 
to zero [| 23 - 

B. Existence of a solution 

In this section, we study the eigenvalues of Ci and C 2 to check if dU) has a unique solution. 
As matrix C 2 = BSB^ is positive semi-definite, its eigenvalues include positive values and 
zeros Ol . In order to study the eigenvalues of Ci, Lemma [His introduced below. 

Lemma 1. If matrix Ai G is symmetric (resp. Hermitian) positive definite and matrix 
A 2 G is symmetric (resp. Hermitian) positive semi-definite, the product A 1 A 2 is 

diagonalizable and all the eigenvalues 0 /A 1 A 2 are non-negative. 

Proof: See proof in Appendix ■ 

According to Lemma [H since the matrix Ci is the product of a symmetric positive definite 
matrix ^ and a symmetric semi-definite matrix (LH)^Al^LH, it is diagonaliz¬ 

able and all its eigenvalues are non-negative. As a consequence, the eigen-decomposition of 
Cl can be expressed as Ci = QAiQ~^, where Ai = diag (Ai, • • • , (diag (Ai, • • • , A^^) 
is a diagonal matrix whose elements are Ai, • • • , and Aj > 0, Vz. Therefore, as long as 
zero is not an eigenvalue of Ci (or equivalently Ci is invertible), any sum of eigenvalues of 
Cl and C 2 is different from zero (more accurately, this sum is > 0), leading to the existence 
of a unique solution of dU)- 

^The invertibility of the matrix is guaranteed since H has full column rank and Ar is positive dehnite. 
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However, the invertibility of Ci is not always guaranteed depending on the forms and 
dimensions of H and L. For example, if n\ < m\, meaning the number of MS bands is 
smaller than the subspaee dimension, the matrix (LH)^Al^LH is rank defieient and thus 
there is not a unique solution of ([7]). In eases when Ci is singular, a regularization or prior 
information is neeessary to be introdueed to ensure (|7]) has a unique solution. In this seetion, 
we focus on the case when Ci is non-singular. The generalization to Bayesian estimators 
based on specific priors already considered in the literature will be elaborated in Section UVl 

C. A classical algorithm for the Sylvester matrix equation 

A classical algorithm for obtaining a solution of the SE is the Bartels-Stewart algorithm 
lf29]l . This algorithm decomposes Ci and C 2 into Schur forms using a QR algorithm and 
solves the resulting triangular system via back-substitution. However, as the matrix C 2 = 
BSB^ is very large for our application (n x n, where n is the number of image pixels), 
it is unfeasible to construct the matrix C 2 , let alone use the QR algorithm to compute its 
Schur form (which has the computational cost Ofm’) arithmetical operations). The next 
section proposes an innovative strategy to obtain an analytically expression of the SE ([7]) 
by exploiting the specific properties of the matrices Ci and C 2 associated with the fusion 
problem. 

D. Proposed closed-form solution 

Using the decomposition Ci = QAcQ“^ and multiplying both sides of (|7]) by leads 
to 

AcQ~^U + Q'1UC2 = Q-'Cs. (8) 

Right multiplying ([8]) by FD on both sides and using the definitions of matrices C 2 and B 
yields 

AiQ-^UFD + Q“^UFD (F^SFD) = Q-^CgFO ( 9 ) 

where D = (D*)D is a real diagonal matrix. Note that UFD = UBF G can be 

interpreted as the Fourier transform of the blurred target image, which is a complex matrix. 
Eq. ® can be regarded as a SE w.r.t. Q”^UFD, which has a simpler form compared to (|7]) 
as Ac is a diagonal matrix. 

The next step in our analysis is to simplify the matrix F^SFD appearing on the left hand 
side of dH). First, it is important to note that the matrix F^SFD has a specific structure since 
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all its columns contain the same bloeks (see Eq. (l29l) in Appendix [B]). Using this property, 
by multiplying left and right by speeific matriees, one obtains a bloek matrix whose nonzero 
blocks are loeated in its first (block) row (see (fT^ l. More preeisely, introduce the following 
matrix 


P = 


Ir 


0 


"Im Im 


-Im 0 


d 


whose inversqj ean be easily eomputed 


0 

0 


( 10 ) 


pi = 


Im 0 


Im Im 


Im 0 


d 


Right multiplying both sides of ([91) by P i leads to 


0 

0 


Ac7U + UM = C3 (11) 

where tj = Q-iUFDP-i, M = P (F^SFD) P~i and C 3 = Q“iC 3 FDP“i. Eq. (E]) is a 
Sylvester matrix equation w.r.t. tj whose solution is signifieantly easier than for ([ 8 ]) because 
the matrix M has a simple form outlined in the following lemma. 


Lemma 2. The following equality holds 


M = P (F^SFD) P-^ = 


EE. D2 

2=1 


D. 


0 0 0 


0 0 0 


( 12 ) 


^Note that left multiplying a matrix by P corresponds to subtracting the first row blocks from all the other row blocks. 
Conversely, right multiplying by the matrix means replacing the first (block) column by the sum of all the other (block) 
columns. 
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where the matrix D has been partitioned as follows 


D 


Di 0 0 

0 D 2 0 

0 0 ■■■ D. 


with Dj m X m real diagonal matrices. 

Proof: See Appendix |Bj ■ 

Finally, using the speeifie form of M provided by Lemma [2l the solution tj of the SE 
(fTTI) ean be eomputed bloek-by-bloek as stated in the following theorem. 

Theorem 1. Let { 03)1 j denotes the jth block of the Ith band of C 3 for any / = 1, • • • , ffix- 
Then, the solution U of the SE (fTTI) can be decomposed as 


U = 


U14 Ui_2 • • • 

U2,l U2,2 • ■ ■ U2,(i 






(13) 


with 


= 


(CakdjEa + A!,/, 


2=1 


-1 


i [(C3)l,, - , 


, J = 1, 

J = 2, • • • , d. 


(14) 


Proof: See Appendix O 


Note that u; j G 


Dlxm 


denotes the jth bloek of the Zth band. Note also that the matrix 


^ ^ Dj + appearing in the expression of u;_i is an n x n rea 


2=1 


inversion is trivial. The final estimator of X is obtained as follows 


diagonal matrix whose 


X = HQUPD'^F^. 


Algo. [H summarizes the steps required to ealeulate the estimated image X. 


(15) 


'’it may happen that the diagonal matrix D does not have full rank (containing zeros in diagonal) or is ill-conditioned 
(having very small numbers in diagonal), resulting from the property of blurring kernel. In this case, can be replaced 
by (D -b for regularization purpose, where r is a small penalty parameter El- 
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Algorithm 1: Fast Fusion of Multi-band Images 
Input: Yl, Yr, Al, Ar, L, B, S, H 

// Circulant matrix decomposition: B = FDF”^ 

1 D ^ Dec (B); 

2 D = D*D; 


/ / Calculate Ci 

3 Cl ^ (H^Ar^H)"^ ((LH)^Al^Lh); 

// Eigen-decomposition of Ci; Cl = QAcQ ^ 

4 (Q, Ac) ^ EigDec(Ci); 

/ / Calculate C 3 

5 C 3 ^ Q-' (H^Ar'H)"' (H^ArIYr(BS)^ +(LH)^A-1Yl)BFP"1;^ 

/ / Calculate U block by block (d blocks) and band by band (m\ bands) 

6 for / = 1 to rhx do 

/ / Calculate the 1st block In Ith band 

uz,i = (C3)u f^ED, + A'cIi 
V i=l 


8 
9 

10 

11 end 


/ / Calculate other blocks in Ith band 

for j = 2 to d do 

I Uu = ^((C3)z,i-KiDi); 

end 


12 Set X = HQUPD^^F^; 

Output: X 


E. Complexity Analysis 

The most computationally expensive part of the proposed algorithm is the computation 
of matrices D and C3 because of the FFT and iFFT operations. Using the notation C4 = 
Q-‘ (H"AS' H) \ the matrix C3 can be rewritten 

C3 = C4 (h^Ar'Yr (BS)^ + (LH)^A-'Yl) bfp -1 

) ^ N (16) 

= C4 (H^ArIYrS^FD* + (LH)^ A-^YlF) DP^i. 

The most heavy step in computing (fT^ is the decomposition B = FDF^ (or equivalently 
the FFT of the blurring kernel), which has a complexity of order 0{n log n). The calculations 
of H^Aj^^YrS^FD* and (LH)^ Al^YlF require one FFT operation each. All the other 
computations are made in the frequency domain. Note that the multiplication by DP~^ has 
a cost of 0{n) operations as D is diagonal, and P“^ reduces to block shifting and addition. 
The left multiplication with (H^Aj^^H) ^ is of order 0{rri\n). Thus, the calculation 
of C 3 BFP"^ has a total complexity of order 0 {n ■ max {logn, m|}). 


DRAFT 


February 12, 2015 






13 


IV. Generalization to Bayesian estimators 

As mentioned in Seetion ITTT-Al if the matrix BSB^ is singular or ill-eonditioned (e.g., 
when the number of MS bands is smaller than the dimension of the subspaee spanned by the 
pixel veetors, i.e., n\ < m\), a regularization or prior information p (U) has to be introdueed 
to ensure the Sylvester matrix equation (fTTl) has a unique solution. The resulting estimator U 
ean then be interpreted as a Bayesian estimator. Combining the likelihood (@1) and the prior 
p (U), Bayes theorem provides the posterior distribution of U 

p(U\^) oc p{^\V)p(U) 

cx p(Yl|U)p(Yr|U)p(U) 

where oc means “proportional to” and where we have used the independenee between the 
observation veetors Yl and Yr. 

The mode of the posterior distribution p (U|^') is the so-ealled MAP estimator whieh ean 
be obtained by solving the following optimization problem 

argminL(U) (17) 

where 

L(U) = i tr ((Yr - HUBS)'^ A^' (Yr - HUBS)) + 
itr ((Yl - LHU)^ Al' (Yr - LHU)) - logp(U). 

Different Bayesian estimators eorresponding to different ehoiees of p(U) have been eonsid- 
ered in the literature. These estimators are first reealled in the next seetions. We will then 
show that the explieit solution of the SE derived in Seetion |III] ean be used to eompute the 
MAP estimator of U for these prior distributions. 

A. Gaussian prior 

Gaussian priors have been used widely in image proeessing lf^ - lf34]l . and ean be inter¬ 
preted as a Tikhonov regularization ll^ . Assume that a matrix normal distribution is assigned 
a priori to the projeeted target image U 

p(U) = S, I„) (19) 

where and S are the mean and eovarianee matrix of the matrix normal distribution. Note 
that the eovarianee matrix S explores the eorrelations between HS band and eontrols the 
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distance between U and its mean fi. Foreing the derivative of i^(U) in (fTVl) to be zero leads 
to the following SE 

CiU + UC 2 = C 3 (20) 


where 

Cl = 

C 2 = BSB^ 

( 21 ) 

C 3 = (H^Aj^ih) '(H^Ar1Yr(BS)^ + 

(LH)^Al1Yl + S“V)- 

The matrix Ci is positive definite as long as the eovarianee matrix is positive definite. 
Algo. [H ean thus be adapted to a matrix normal prior ease by simply replaeing Ci and C 3 
by their new expressions defined in ( 1211 ) . 


B. Non-Gaussian prior 

The objective funetion T(U) in (fT71) can be split into a data term /(U) eorresponding to 
the likelihood and a regularization term (/>(U) eorresponding to the non-Gaussian prior in a 
Bayesian framework as 

L(U) = /(U) + 0(U) (22) 

where 

/(U) = \ tr ((Yr - HUBS)^ ArI (Yr - HUBS)) 

+ i tr ((Yl - LHU)^ Al^ (Yl - LHU)) 

and 

0(U) = -logp(U). 

The optimization of (l22l) w.r.t. U ean be solved efficiently by using an ADMM that consists 
of two steps: 1) solving a surrogate optimization problem assoeiated with a Gaussian prior 
and 2) applying a proximity operator If^ . This strategy can be implemented in the image 
domain or in the frequeney domain. The resulting algorithms, named as SE-within-ADMM 
(SE-ADMM) methods, are deseribed below. 

1) Solution in image domain: Eq. (l22l) ean be rewritten as 

L(U,V) = /(U) + 0(V) s.t. U = V. 
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The augmented Lagrangian assoeiated with this problem is 

V, A) = /(U) + 0(V) + A^(U - V) + |||U - V||^ (23) 

or equivalently 

L^(U, V, W) = /(U) + 0(V) + |||U - V - Will (24) 

where W is the sealed dual variable. This optimization problem ean be solved by an ADMM 

as follows 

(Ufc+i^V'=+^) = argmin/(U) + ())(V) + 

f||U-V-w^lll 
y^k+i ^ — v^+^). 

More speeifieally, the updates of the derived ADMM algorithm are 

U^+i =argmin/(U) + f||U-V^-W^||| 

yk+i ^ prox^^^(u^+i - W^) (25) 

yi^k+i ^ — V^+^). 

• Update U: Instead of using any iterative update method, the optimization w.r.t. U ean 

be solved analytically by using Algo. [T] as for the Gaussian prior investigated in Section 
IIV-AI For this, we can set /x = and = /ilm^ i^i (l2TI) . However, the 

computational complexity of updating U in each iteration is C>(n log n) because of the 
FFT and iFFT steps required for computing C 3 and U from U. 

• Update V: The update of V requires computing a proximity operator, which depends on 
the form of 0(V). When the regularizer 0(V) is simple enough, the proximity operator 
can be evaluated analytically. For example, if 0(V) = ||V||i, then 

prox^_^(U^+^ - W^) = soft - W^ 

where soft is the soft-thresholding function defined as 

soft(fif, r) = sign( 5 () max{\g\ - r, 0 ). 

More examples of proximity computations can be found in O^ . 

• Update W: The update of W is simply a matrix addition whose implementation has a 
small computational cost. 
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2) Solution in frequency domain: Recalling that B = FDF^, a less computationally 
expensive solution is obtained by rewriting Iv(U) in (l22l) as 

L(W, V) = f{U) + 0(V) s.t. U = V 

where lA = UF is the Fourier transform of U and 

f{U) = ^ tr ((Yr - HWDF^S)^Ar1(Yr - HWDF^S)) 

+ itr ((Yl - LHWF^)^ Al' (Yl - LHWF^)) 
and 

0(V) = -logp(V). 

Thus, the ADMM updates, defined in the image domain by (|25]) . can be rewritten in the 
frequency domain as 

yk+i = argmin/(W) + f||W- 

lA 

yfc+i _ prox^^^(wfc+i - yyfc) (26) 

ypfc+i = ypfc _ _ yfc+i)_ 

At the {k + l)th ADMM iteration, updating hi can be efficiently conducted thanks to a SE 
solver similar to Algo. [H where the matrix C 3 is defined by 

C 3 = C, + Cc (V^ + W’^) (27) 

with 

C, = (H^Ar^H)”^ (H^Ar^YrS^FD^ 

+ (LH)^A-^YlF)DP“^ 

Cc = (H^Ar^H)”^S-^ 

Note that the update of C3 does not require any EFT computation since and Cc can be 
calculated once and are not updated in the ADMM iterations. 

C. Hierarchical Bayesian framework 

When using a Gaussian prior, a hierarchical Bayesian framework can be constructed by 
introducing a hyperprior to the hyperparameter vector $ = {/x, S}. Several priors have been 
investigated in the literature based on generalized Gaussian distributions, sparsity-promoted 
hi or fo regularizations, (.2 smooth regularization, or TV regularization. Denoting as p($) 
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the prior of the optimization w.r.t. U ean be replaeed by an optimization w.r.t. (U, $) as 
follows 

(U,#) = argmaxp 

u,$ 

= argmaxp(YL|U)p(YR|U)p(U|$)p(#). 
u,^ 

A standard way of solving this problem is to optimize alternatively between U and $ using 
the following updates 

U^+i = argmaxp(YL|U)p(YR|U)p(U|$^) 

= argmaxp p ($). 

The update of ean be solved using Algo. [U whereas the update of # depends on the 
form of the hyperprior p ($). The derived optimization method is named as a SE-within-bloek 
eoordinate descent (SE-BCD) method. 

It is interesting to note that the strategy of Section IIV-BI proposed to handle the case of a 
non-Gaussian prior can be interpreted as a special case of a hierarchical updating. Indeed, if 
we interpret V + d and (l24l) as the mean and covariance matrix S, the ADMM 

update (1251) can be considered as the iterative updates of U and /x = V + d with fixed 

s = 


V. Experimental results 

This section applies the proposed fusion method to three kinds of priors that have been 
investigated in [fT^ . IfT^ and If20l for the fusion of multi-band images. Note that these three 
methods require to solve a minimization problem similar to (fT71) . All the algorithms have 
been implemented using MATEAB R2013A on a computer with Intel(R) Core(TM) 17-2600 
CPU@3.40GHz and 8GB RAM. The MATEAB codes and all the simulation results are 
available in the first author’s homepagqf. 


A. Fusion Quality Metrics 

To evaluate the quality of the proposed fusion strategy, five image quality measures have 
been investigated. Referring to IfT^ . 0711 . we propose to use the restored signal to noise 
ratio (RSNR), the averaged spectral angle mapper (SAM), the universal image quality index 
(UIQI), the relative dimensionless global error in synthesis (ERGAS) and the degree of 


'http://wei.perso.enseeiht.fr/ 
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Fig. 1. HS+MS fusion results. Top: HS image (1st column), MS image (2nd column) and reference image (3rd column). 
Middle and bottom: state-of-the-art-methods and corresponding proposed fast fusion methods, respectively, with various 
regularizations: supervised naive Gaussian prior (1st column), unsupervised naive Gaussian prior (2nd column), sparse 
representation (3rd column) and TV (4th column). 


distortion (DD) as quantitative measures. The RSNR is defined by the negative logarithm of 
the distanee between the estimated and reference images. The larger RSNR, the better the 
fusion. The definition of SAM, UIQI, ERGAS and DD can be found in ifT^ . The smaller 
SAM, ERGAS and DD, the better the fusion. The larger UIQI, the better the fusion. 

B. Fusion of HS and MS images 

The reference image considered here as the high-spatial and high-spectral image is a 
256 X 128 X 93 HS image acquired over Pavia, Italy, by the reflective optics system imaging 
spectrometer (ROSIS). This image was initially composed of 115 bands that have been 
reduced to 93 bands after removing the water vapor absorption bands. A composite color 
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image of the seene of interest is shown in the top right of Fig. [H 

Our objeetive is to reeonstruet the high-spatial high-speetral image X from a low-spatial 
high-speetral HS image Yr and a high-spatial low-speetral MS image Yl- First, Yr has 
been generated by applying a 5 x 5 averaging filter and by down-sampling every 4 pixels 
in both vertieal and horizontal direetions for eaeh band of the referenee image. Seeond, a 
4-band MS image Yl has been obtained by filtering X with the LANDS AT-like reflectanee 
speetral responses ll^ . The HS and MS images are both eontaminated by zero-mean additive 
Gaussian noises. Our simulations have been eondueted with SNRr,* = 35dB for the first 43 
bands of the HS image and SNRr,* = 30dB for the remaining 50 bands, with 



For the MS image 



for all speetral bands. 

The observed HS and MS images are shown in the top left and middle figures of Fig. [TJ 
Note that the HS image has been interpolated for better visualization and that the MS image 
has been displayed using an arbitrary color composition. The subspace transformation matrix 
H has been defined as the PC A following the strategy of ifT^ . 

1) Example 1: HS+MS fusion with a naive Gaussian prior: We first consider a Bayesian 
fusion model initially proposed in llT^ . This method assumed a naive Gaussian prior for the 
target image, leading to an £ 2 -regularization of the fusion problem. The mean of this Gaussian 
prior was fixed to an interpolated HS image. The covariance matrix of the Gaussian prior 
can be fixed a priori (supervised fusion) or estimated jointly with the unknown image within 
a hierarchical Bayesian method (unsupervised fusion). The estimator studied in [fT^ was 
based on a hybrid Gibbs sampler generating samples distributed according to the posterior 
of interest. An ADMM step embedded in a BCD method (ADMM-BCD) was also proposed 
in |1T9]| providing a significant computational cost reduction. This section compares the 
performance of this ADMM-BCD algorithm with the performances of the SE-based methods 
for these fusion problems. 

For the supervised case, the explicit solution of the SE can be constructed directly following 
the Gaussian prior-based generalization in Section IIV-AI Conversely, for the unsupervised 
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TABLE II 

Performance oe the fusion methods: RSNR (in dB), UIQI, SAM (in degree), ERGAS, DD (in 10“®) and 

TIME (IN SECOND). 


Regularization 

Methods 

RSNR 

UIQI 

SAM 

ERGAS 

DD 

Time 

supervised 

ADMM 1231 

29.295 

0.9906 

1.556 

0.892 

7.121 

88.58 

naive Gaussian 

Proposed SE 

29.366 

0.9908 

1.552 

0.880 

7.094 

0.57 

unsupervised 

ADMM-BCD [21 

29.078 

0.9902 

1.595 

0.914 

7.279 

53.90 

naive Gaussian 

Proposed SE-BCD 

29.076 

0.9902 

1.624 

0.914 

7.369 

1.07 

sparse 

ADMM-BCD (211 

29.575 

0.9912 

1.474 

0.860 

6.833 

96.09 

representation 

Proposed SE-BCD 

29.600 

0.9913 

1.472 

0.856 

6.820 

23.72 

TV 

ADMM (lOl 

29.470 

0.9911 

1.503 

0.861 

6.923 

138.98 

Proposed SE-ADMM 

29.626 

0.9914 

1.478 

0.846 

6.795 

93.94 


case, the generalized version denoted SE-BCD and deseribed in Section IIV-CI is exploited, 
whieh requires embedding the elosed-form solution into a BCD algorithm (refer lfT9ll for more 
details). The estimated images obtained with the different algorithms are depieted in Fig. [T] 
and are visually very similar. More quantitative results are reported in Table |n] and eonfirm 
the similar performance of these methods in terms of the various fusion quality measures 
(RSNR, UIQI, SAM, ERGAS and DD). However, the eomputational time of the proposed 
algorithm is reduced by a faetor larger than 100 (supervised) and 50 (unsupervised) due to 
the existenee of a elosed-form solution for the Sylvester matrix equation. 

2) Example 2: HS+MS fusion with a sparse representation: This section investigates a 
Bayesian fusion model based on a sparse representation introdueed in ifT^ . The basie idea 
of this approaeh was to design a prior that results from the sparse deeomposition of the 
target image on a set of dictionaries learned empirieally. Some parameters needed to be 
adjusted by the operator (regularization parameter, dietionaries and supports) whereas the 
other parameters (sparse eodes) were jointly estimated with the target image. In [fT^ . the 
MAP estimator assoeiated with this model was reaehed using an optimization algorithm that 
eonsists of an ADMM step embedded in a BCD method (ADMM-BCD). Using the strategy 
proposed in Seetion lTV-Ci this ADMM step ean be avoid by exploiting the SE solution. Thus, 
the performance of the ADMM-BCD algorithm in [fT^ is eompared with the performance 
of the SE-BCD seheme as deseribed in Seetion llV-Ci As shown in Fig. [T] and Table HIl the 
performanee of both algorithms is quite similar. However, the proposed solution exhibits a 
signifieant eomplexity reduction. 
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3) Example 3: HS+MS Fusion with TV regularization: The third experiment is based on a 
TV regularization (ean be interpreted as a specific instance of a non-Gaussian prior) studied in 
EOll . The regularization parameter of this method needs to be fixed by the user. The ADMM- 
based method investigated in GOll requires to compute a TV-based proximity operator (which 
increases the computational cost when compared to the previous algorithms). To solve this 
optimization problem, the frequency domain SE solution derived in Section IIV-B2I can be 
embedded in an ADMM algorithm. The fusion results obtained with the ADMM method of 
If20ll and the proposed SE-ADMM method are shown in Eig. [T] and are quite similar. Table 
ini confirms this similarity more quantitatively by using the quality measures introduced in 
Section IV-AI Note that the computational time obtained with the proposed explicit fusion 
solution is reduced when compared to the ADMM method. In order to complement this 
analysis, the convergence speeds of the SE-ADMM algorithm and the ADMM method of 
EOll are studied by analyzing the evolution of the objective function for the two fusion 
solutions. Eig. [2] shows that the SE-ADMM algorithm converges faster at the starting phase 
and gives smoother convergence result. 



Fig. 2. Convergence speed of the ADMM 1201 and the proposed SE-ADMM with the TV-regularization. 


C. Hyperspectral Pansharpening 

When n\ = 1, the fusion of HS and MS images reduces to the HS pansharpening 
(HS-i-PAN) problem, which is the extension of conventional pansharpening (MS-i-PAN) and 
has become an important and popular application in the area of remote sensing @ . In order to 
show that the proposed method is also applicable to this problem, we conducted HS and PAN 
image fusion for another HS dataset. The reference image, considered here as the high-spatial 
and high-spectral image, is an hyperspectral image acquired over Moffett field, CA, in 1994 
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Fig. 3. Hyperspectral pansharpening results. 1st column: HS image. 2nd column: PAN image. 3rd column: Reference 
image. 4th column: ADMM (m. 5th column: Proposed method. 


by the JPL/NASA airborne visible/infrared imaging speetrometer (AVIRIS) OUl. This image 
was initially composed of 224 bands that have been reduced to 176 bands after removing the 
water vapor absorption bands. The HS image has been generated by applying a 5 x 5 averaging 
filter on each band of the reference image. Besides, a PAN image is obtained by successively 
averaging the adjacent bands in visible bands (1~41 bands) according to realistic spectral 
responses. In addition, the HS and PAN images have been both contaminated by zero-mean 
additive Gaussian noises. The SNR of the HS image is 35dB for the first 126 bands and 
30dB for the last remaining bands. 

A naive Gaussian prioiO is chosen to regularize this ill-posed inverse problem. The SE based 
method is compared with the ADMM method of [[T9]| to solve the supervised pansharpening 
problem (i.e., with fixed hyperparameters). The results are displayed in Fig. [3] whereas 
more quantitative results are reported in Table |nll Again, the proposed SE-based method 
provides similar qualitative and quantitative fusion results with a significant computational 
cost reduction. 


VI. Conclusion 

This paper developed a fast multi-band image fusion method based on an explicit solution 
of a Sylvester equation. This method was applied to both the fusion of multispectral and hy¬ 
perspectral images and to the fusion of panchromatic and hyperspectral images. Coupled with 


®Due to space limitation, only the Gaussian prior of GD is considered in this experiment. However, additional simulation 
results for other priors are available in the technical report Gq). 
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TABLE III 

Performance of the Pansharpening methods: RSNR (in dB), UIQI, SAM (in degree), ERGAS, DD (in 

10“^) AND TIME (IN SECOND). 


Methods 

RSNR UIQI SAM ERGAS DD Time 

ADMM [m 

18.616 0.9800 4.937 3.570 1.306 98.12 

Proposed SE 

18.652 0.9802 4.939 3.555 1.302 0.54 


the alternating direetion method of multipliers and bloek eoordinate deseent, the proposed 
algorithm can be easily generalized to compute Bayesian estimators for the fusion problem. 
Besides, the analytically solution can be embedded in block coordinate descent algorithm 
to solve the hierarchical Bayesian estimation problem. Numerical experiments showed that 
the proposed fast fusion method compares competitively with the ADMM based methods, 
with the advantage of reducing the computational complexity significantly. Future work will 
consist of incorporating learning of the subspace transform matrix H into the fusion scheme. 
Implementing the proposed fusion scheme in real datasets will also be interesting. 

Appendix A 
Proof of Lemma [I] 

As Ai is symmetric (resp. Hermitian) positive definite, Ai can be decomposed as Ai = 

i i i 

A^A^, where A^ is also symmetric (resp. Hermitian) positive definite thus invertible. 
Therefore, we have 

AiA 2 = Af (^Af A 2 Af) A^i (28) 

1 11 

As Af and A 2 are both symmetric (resp. Hermitian) matrices, A^ A 2 A^ is also a symmetric 

(resp. Hermitian) matrix that can be diagonalized. As a consequence, A 1 A 2 is similar to a 

diagonalizable matrix, and thus it is diagonalizable. 

11 1 

Similarly, A 2 can be written as A 2 = AfAf, where Af is positive semi-definite. Thus, 

1 1 1111 

Af A 2 A^ = A^ Af Af A^ is positive semi-definite showing that all its eigenvalues are non¬ 
negative. As similar matrices share equal similar eigenvalues, the eigenvalues of A 1 A 2 are 
non-negative. 


Appendix B 
Proof of Lemma [2] 

First, we introduce the following lemma. 
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Lemma 3. The following equality holds 

F^SF = hd®Im 

d 

where F and S are defined as in Section I///-AI is the d x d matrix of ones and Im is the 
m X m identity matrix. 


Proof: See Appendix |Dj 
Aeeording to Lemma [3l we have 


F^SFD = i(J,®IjD = i 


D2 • ■ ■ Dq 


(29) 


D2 ■ ■ ■ 

Thus, multiplying (l29l) by P on the left side and by on the right side leads to 

M = P (F^SFD) P-1 

D j D2 ■ • ■ 


0 0 


0 0 


0 


0 


ED, D 2 

i=l 

0 0 


0 0 


D, 

0 

0 


Appendix C 
Proof of Theorem [I] 


Substituting (fT^ and (fT3]) into (fTTI) leads to (I3T]) . where 



(^3)1,1 

(03)1,2 • 

• ( 03 )l,d 

(03)2,1 

(03)2,2 ■ 

■ (03)2,11 


(C 3 )rf ,2 ■ 

• (Os)^,^ _ 


(30) 


Identifying the first (bloek) eolumns of (l3TI) allows us to eompute the element ui 1 for 
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Ou 3 E a + Ail 


2 = 1 
d 


02,1 3 E a + Ap, 


2 = 1 


AcUi,2 + ^Ui,iD2 
AcU 2,2 + ^U 2 ,iD 2 


2 = 1 


UmA,l ( ^ S Dj + Xc^ln ] 


A^Ui,rf + iui,iD, 


AcU2,d + dU2,lD^ 


A(;; iD^ 


= C., 


(31) 


I = 1, ...,d as follows 


-1 


ou = (C3),,, (-5^a+A‘,i, 


2=1 


for / = 1, • • • , mx- Using the values of u; i determined above, it is easy to obtain 2 , • • 
as 


) ^l,d 


U .2 = 3 ;r 


C L 


(C3)iy - 


for / = 1, • • • , m\ and j = 2, • • • , d. 


Appendix D 
Proof of Lemma [3] 

The n dimensional DFT matrix F can be written explicitly as follows 



1 

1 

1 

1 

1 


1 

to 

a;2 

• 

LO^-^ 

1 

1 


to^ 

a;6 . 

. a;2(n-l) 

v^n 

1 

u;3 

a;6 


(j3(n-l) 


1 



^3(n-l) . 



where oj = e is a primitive nth root of unity in which i = \/—1. The matrix S can also 
be written as follows 

s = El + Ei_|_£; + ■ ■ ■ + Ei+(m-l)d 
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where Ej G is a matrix containing only one non-zero element equal to 1 located at the 
i\h row and i\h column as follows 


E. = 


0 0 0 


0 0 0 


It is obvious that Ej is an idempotent matrix, i.e., Ej = E^. Thus, we have 


F^EjF 


{EiFfEiF= ■ 



0 



where fj = [l ce® ^ ^^^® is the ith row of the matrix F and 

0 g l^ixn is (^j^g 2 ero vector of dimension 1 x n. Straightforward computations lead to 


1 


u 


2—1 





1 




^-(i-l)(n-2) 


^(i-l)(n-2) 

1 
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n 

Using the cu’s property ^ cu* = 0 and n = md leads to 

ff fl + ^l+d^l+d H-fi^(m_i)rffl+(m-l)d 


m 

0 

■ o’ 


m 

0 ■■ 

• o’ 

0 

m ■ 

■ 0 


0 

m ■ 

■ 0 

_0 

0 ■■ 

• m 


_0 

0 ■■ 

• m 


I 

n 


m 

0 

■ o’ 


m 

0 ■ 

• o' 

0 

m ■ 

■ 0 


0 

m ■ 

■ 0 

_0 

0 

■ m 


_0 

0 • 

■ m 


■m ■ ■ ■ Am 


1 

d 


d^^ ® 
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